Section: New Results
Recognizing Human Actions Using RGB Sport Videos From the Web
Participants : Amir Nazemi, François Brémond.
keywords: Action Recognition, Activity Recognition, Video Summarization, Web Sport Videos, Golf Videos.
The aim of this work is to extract sport actions from web sport streaming videos and use them for highlight detection. The sport videos used in this research are golf videos. The report covers four steps: data preparation, framework design, method selection and experimental results.
Data Preparation
Table 8: Description of the golf dataset.
Class name | Number of samples |
Tee shot | 73 |
Putt | 70 |
Standing | 81 |
Table 9: Accuracy of the selected methods on the golf dataset.
Method | Accuracy on golf dataset |
LSTM + Geometrical Features | 91.5 % |
P-CNN | 97.32 % |
First, a dataset is built from a streaming video. It contains three action classes: Tee shot, Putt and Standing. Table 8 describes the dataset.
Framework
After preparing the dataset, the next step is to define a solution to the problem. Since one of the main goals of this research is to provide a general solution for sport videos, we proposed a solution based on the skeleton, i.e. on human poses. Our framework consists of human pose detection, human tracking and action recognition, in that order. For human pose detection we used a recent method named OpenPose [105]. For human pose tracking we used the tracking method of the Inria STARS SUP framework. Finally, for action recognition we ran experiments to choose the best method.
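The three-stage organisation described above can be sketched as follows. This is only an illustrative skeleton, not the actual STARS implementation: the names `run_pipeline`, `detect_poses`, `assign_track_ids`, `classify_sequence` and `TrackedPose` are hypothetical, and each stage is passed in as a plain callable so that a pose detector such as OpenPose, a tracker, or a classifier could be plugged in behind it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float]   # (x, y) image coordinates of one joint
Pose = List[Keypoint]            # one skeleton detected in one frame


@dataclass
class TrackedPose:
    track_id: int      # identity assigned by the tracker (hypothetical field)
    frame_index: int   # position of the frame in the video
    pose: Pose


def run_pipeline(
    frames: Sequence[Any],
    detect_poses: Callable[[Any], List[Pose]],
    assign_track_ids: Callable[[int, List[Pose]], List[int]],
    classify_sequence: Callable[[List[Pose]], str],
) -> Dict[int, str]:
    """Chain pose detection, tracking and action recognition, in that order."""
    tracks: Dict[int, List[TrackedPose]] = {}
    for frame_index, frame in enumerate(frames):
        poses = detect_poses(frame)                       # stage 1: pose detection
        track_ids = assign_track_ids(frame_index, poses)  # stage 2: tracking
        for track_id, pose in zip(track_ids, poses):
            tracks.setdefault(track_id, []).append(
                TrackedPose(track_id, frame_index, pose)
            )
    # Stage 3: one action label per tracked person, from that person's pose sequence.
    return {
        track_id: classify_sequence([tp.pose for tp in sequence])
        for track_id, sequence in tracks.items()
    }
```

Keeping the stages behind simple function interfaces reflects the stated goal of a general solution for sport videos: the action recognition stage can be swapped without touching detection or tracking.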
Method Selection
Among the different methods in the field of action recognition, we selected P-CNN [55], which is the state of the art on some datasets. Additionally, to have an alternative that is faster than P-CNN, we proposed a method based on geometrical features of human poses. These geometrical features are fed to a Long Short-Term Memory (LSTM) network, which constitutes the second solution.
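To make the idea of geometrical pose features concrete, here is a minimal sketch. The report does not list which features were used, so the choice below (two joint angles and one torso-normalised distance) is an assumption, as are the joint names (`"neck"`, `"hip"`, `"shoulder"`, `"elbow"`, `"wrist"`, `"knee"`, `"ankle"`) and the function names. The per-frame vectors it produces are the kind of sequence an LSTM could consume.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates of one joint


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two joints."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at joint b, in radians, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate case: two joints coincide
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def frame_features(pose: Dict[str, Point]) -> List[float]:
    """Feature vector for one frame (assumed feature set, not the report's)."""
    torso = distance(pose["neck"], pose["hip"]) or 1.0  # scale normaliser
    return [
        joint_angle(pose["shoulder"], pose["elbow"], pose["wrist"]),  # elbow angle
        joint_angle(pose["hip"], pose["knee"], pose["ankle"]),        # knee angle
        distance(pose["wrist"], pose["hip"]) / torso,                 # hand height proxy
    ]


def sequence_features(poses: Sequence[Dict[str, Point]]) -> List[List[float]]:
    """One feature vector per frame, in temporal order."""
    return [frame_features(pose) for pose in poses]
```

Angles and normalised distances are invariant to the person's position and scale in the image, which is one reason such features are much cheaper to process than the appearance and optical-flow descriptors used by P-CNN.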
Experimental Results
Table 9 shows the results of the selected methods on the prepared golf dataset. As the table shows, P-CNN (97.32 %) outperforms the LSTM with geometrical features (91.5 %).